[TRAINIUM] improve support #421

Draft

mayank31398 wants to merge 29 commits into main from n

Conversation

@mayank31398
Collaborator

No description provided.

Signed-off-by: Mayank Mishra <mayank31398@gmail.com>
This reverts commit 0b7b3df.
This reverts commit 0afe9ab.
@gemini-code-assist Bot (Contributor) left a comment


Code Review

This pull request introduces comprehensive support for AWS Trainium accelerators, including specialized data type handling (int32), dynamic selection of compilation backends, and integration with the Neuron profiler. It also implements efficient model initialization and refactors positional embedding logic within the dense model mixins. Feedback focuses on correcting the usage of nullcontext, addressing potential RuntimeError and TypeError exceptions in the autocast and profiling logic due to platform-specific arguments, and improving the portability of utility scripts by removing hardcoded file paths.

Comment thread lm_engine/pretrain.py
device_type=Accelerator.get_device_type(), dtype=string_to_torch_dtype(args.mixed_precision_args.dtype)
)
if args.distributed_args.fsdp_algorithm is None
else nullcontext

critical

nullcontext is a class and must be instantiated (i.e., nullcontext()) to be used as a context manager. Using the bare class in a with statement raises a TypeError because the __enter__/__exit__ special methods are looked up on the type of the context expression, and the class object itself does not satisfy the context manager protocol.

Suggested change
else nullcontext
else nullcontext()
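
As a minimal standalone sketch (not specific to this repository), the difference looks like this:

from contextlib import nullcontext

# Instantiated: a valid no-op context manager.
with nullcontext():
    pass

# Bare class: fails at `with` time (TypeError on Python 3.11+,
# AttributeError on older versions), because the special-method
# lookup happens on the type of the context expression.
# with nullcontext:
#     pass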

Comment thread lm_engine/pretrain.py
enable_kernels(args.kernel_args.kernels),
(
torch.autocast(
device_type=Accelerator.get_device_type(), dtype=string_to_torch_dtype(args.mixed_precision_args.dtype)

high

torch.autocast does not natively support device_type='xla' or 'neuron' in standard PyTorch. This will cause a RuntimeError when running on TPU or Trainium if the environment does not have a specifically patched version of PyTorch. For these accelerators, it is generally recommended to use device_type='cpu' (which is how Neuron AMP is typically triggered) or the accelerator-specific autocast context (e.g., torch_xla.amp.autocast).
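
A hedged sketch of one way to select the context per device: Accelerator.get_device_type() is the helper already used in this PR, while the "cuda"/"cpu" whitelist and the bf16 CPU fallback for XLA-backed devices are assumptions about the runtime, not verified behavior.

from contextlib import nullcontext
import torch

def autocast_context(device_type: str, dtype: torch.dtype):
    # Standard PyTorch autocast backends.
    if device_type in ("cuda", "cpu"):
        return torch.autocast(device_type=device_type, dtype=dtype)
    # Assumption: on XLA/Neuron devices, AMP is typically driven through
    # CPU autocast (bf16) or handled by the compiler; fall back to a
    # no-op context rather than passing an unsupported device_type.
    if dtype == torch.bfloat16:
        return torch.autocast(device_type="cpu", dtype=dtype)
    return nullcontext()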

repeat=1,
),
on_trace_ready=torch.profiler.tensorboard_trace_handler(path),
experimental_config=experimental_config,

high

The experimental_config parameter is not part of the torch.profiler.profile signature on older PyTorch releases. torch-neuronx supports it, but passing it (even as None) can raise a TypeError on installations that lack it (e.g., when running on CUDA or CPU). To maintain cross-platform compatibility, consider a conditional approach or dictionary unpacking so that profile is called without this argument on non-Trainium devices.
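
A sketch of the dictionary-unpacking approach; is_trainium, experimental_config, and path are stand-ins for whatever the surrounding code provides:

import torch

is_trainium = False         # hypothetical runtime check for torch-neuronx
experimental_config = None  # would be a Neuron-specific config object
path = "./profiler_traces"

profile_kwargs = {}
if is_trainium:
    # Only Neuron-aware builds ever see this keyword argument.
    profile_kwargs["experimental_config"] = experimental_config

prof = torch.profiler.profile(
    schedule=torch.profiler.schedule(wait=1, warmup=1, active=1, repeat=1),
    on_trace_ready=torch.profiler.tensorboard_trace_handler(path),
    **profile_kwargs,
)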

@@ -0,0 +1 @@
ssh -i ~/Desktop/mayank-melbourne.pem -L 8001:localhost:3001 -L 8002:localhost:3002 trainium-melbourne -fN

medium

The script contains a hardcoded absolute path to a personal PEM file (~/Desktop/mayank-melbourne.pem). This makes the script non-portable and potentially exposes details about your local file system. It is recommended to use an environment variable or a generic placeholder.

Suggested change
ssh -i ~/Desktop/mayank-melbourne.pem -L 8001:localhost:3001 -L 8002:localhost:3002 trainium-melbourne -fN
ssh -i ${PEM_FILE:-/path/to/your/key.pem} -L 8001:localhost:3001 -L 8002:localhost:3002 ${REMOTE_HOST:-trainium-melbourne} -fN
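
With this form, PEM_FILE and REMOTE_HOST can be exported in the environment before running the script, and the ${VAR:-default} expansions fall back to the placeholders when they are unset.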
